Goto

Collaborating Authors

 distribution model



EcoCast: A Spatio-Temporal Model for Continual Biodiversity and Climate Risk Forecasting

Akande, Hammed A., Gidado, Abdulrauf A.

arXiv.org Machine Learning

Increasing climate change and habitat loss are driving unprecedented shifts in species distributions. Conservation professionals urgently need timely, high-resolution predictions of biodiversity risks, especially in ecologically diverse regions like Africa. We propose EcoCast, a spatio-temporal model designed for continual biodiversity and climate risk forecasting. Utilizing multisource satellite imagery, climate data, and citizen science occurrence records, EcoCast predicts near-term (monthly to seasonal) shifts in species distributions through sequence-based transformers that model spatio-temporal environmental dependencies. The architecture is designed with support for continual learning to enable future operational deployment with new data streams. Our pilot study in Africa shows promising improvements in forecasting distributions of selected bird species compared to a Random Forest baseline, highlighting EcoCast's potential to inform targeted conservation policies. By demonstrating an end-to-end pipeline from multi-modal data ingestion to operational forecasting, EcoCast bridges the gap between cutting-edge machine learning and biodiversity management, ultimately guiding data-driven strategies for climate resilience and ecosystem conservation throughout Africa.


BATIS: Bayesian Approaches for Targeted Improvement of Species Distribution Models

Villeneuve, Catherine, Akera, Benjamin, Teng, Mélisande, Rolnick, David

arXiv.org Artificial Intelligence

Species distribution models (SDMs), which aim to predict species occurrence based on environmental variables, are widely used to monitor and respond to biodiversity change. Recent deep learning advances for SDMs have been shown to perform well on complex and heterogeneous datasets, but their effectiveness remains limited by spatial biases in the data. In this paper, we revisit deep SDMs from a Bayesian perspective and introduce BATIS, a novel and practical framework wherein prior predictions are updated iteratively using limited observational data. Models must appropriately capture both aleatoric and epistemic uncertainty to effectively combine fine-grained local insights with broader ecological patterns. We benchmark an extensive set of uncertainty quantification approaches on a novel dataset including citizen science observations from the eBird platform. Our empirical study shows how Bayesian deep learning approaches can greatly improve the reliability of SDMs in data-scarce locations, which can contribute to ecological understanding and conservation efforts.


Modeling Headway in Heterogeneous and Mixed Traffic Flow: A Statistical Distribution Based on a General Exponential Function

Leungbootnak, Natchaphon, Li, Zihao, Wei, Zihang, Lord, Dominique, Zhang, Yunlong

arXiv.org Artificial Intelligence

The ability of existing headway distributions to accurately reflect the diverse behaviors and characteristics in heterogeneous traffic (different types of vehicles) and mixed traffic (human-driven vehicles with autonomous vehicles) is limited, leading to unsatisfactory goodness of fit. To address these issues, we modified the exponential function to obtain a novel headway distribution. Rather than employing Euler's number (e) as the base of the exponential function, we utilized a real number base to provide greater flexibility in modeling the observed headway. However, the proposed is not a probability function. We normalize it to calculate the probability and derive the closed-form equation. In this study, we utilized a comprehensive experiment with five open datasets: highD, exiD, NGSIM, Waymo, and Lyft to evaluate the performance of the proposed distribution and compared its performance with six existing distributions under mixed and heterogeneous traffic flow. The results revealed that the proposed distribution not only captures the fundamental characteristics of headway distribution but also provides physically meaningful parameters that describe the distribution shape of observed headways. Under heterogeneous flow on highways (i.e., uninterrupted traffic flow), the proposed distribution outperforms other candidate distributions. Under urban road conditions (i.e., interrupted traffic flow), including heterogeneous and mixed traffic, the proposed distribution still achieves decent results.



Practical Equivalence Testing and Its Application in Synthetic Pre-Crash Scenario Validation

Wu, Jian, Sander, Ulrich, Flannagan, Carol, Zhao, Minxiang, Bärgman, Jonas

arXiv.org Artificial Intelligence

The use of representative pre-crash scenarios is critical for assessing the safety impact of driving automation systems through simulation. However, a gap remains in the robust evaluation of the similarity between synthetic and real-world pre-crash scenarios and their crash characteristics. Without proper validation, it cannot be ensured that the synthetic test scenarios adequately represent real-world driving behaviors and crash characteristics. One reason for this validation gap is the lack of focus on methods to confirm that the synthetic test scenarios are practically equivalent to real-world ones, given the assessment scope. Traditional statistical methods, like significance testing, focus on detecting differences rather than establishing equivalence; since failure to detect a difference does not imply equivalence, they are of limited applicability for validating synthetic pre-crash scenarios and crash characteristics. This study addresses this gap by proposing an equivalence testing method based on the Bayesian Region of Practical Equivalence (ROPE) framework. This method is designed to assess the practical equivalence of scenario characteristics that are most relevant for the intended assessment, making it particularly appropriate for the domain of virtual safety assessments. We first review existing equivalence testing methods. Then we propose and demonstrate the Bayesian ROPE-based method by testing the equivalence of two rear-end pre-crash datasets. Our approach focuses on the most relevant scenario characteristics. Our analysis provides insights into the practicalities and effectiveness of equivalence testing in synthetic test scenario validation and demonstrates the importance of testing for improving the credibility of synthetic data for automated vehicle safety assessment, as well as the credibility of subsequent safety impact assessments.


CISO: Species Distribution Modeling Conditioned on Incomplete Species Observations

Abdelwahed, Hager Radi, Teng, Mélisande, Zbinden, Robin, Pollock, Laura, Larochelle, Hugo, Tuia, Devis, Rolnick, David

arXiv.org Artificial Intelligence

Species distribution models (SDMs) are widely used to predict species' geographic distributions, serving as critical tools for ecological research and conservation planning. Typically, SDMs relate species occurrences to environmental variables representing abiotic factors, such as temperature, precipitation, and soil properties. However, species distributions are also strongly influenced by biotic interactions with other species, which are often overlooked. While some methods partially address this limitation by incorporating biotic interactions, they often assume symmetrical pairwise relationships between species and require consistent co-occurrence data. In practice, species observations are sparse, and the availability of information about the presence or absence of other species varies significantly across locations. To address these challenges, we propose CISO, a deep learning-based method for species distribution modeling Conditioned on Incomplete Species Observations. CISO enables predictions to be conditioned on a flexible number of species observations alongside environmental variables, accommodating the variability and incompleteness of available biotic data. We demonstrate our approach using three datasets representing different species groups: sPlotOpen for plants, SatBird for birds, and a new dataset, SatButterfly, for butterflies. Our results show that including partial biotic information improves predictive performance on spatially separate test sets. When conditioned on a subset of species within the same dataset, CISO outperforms alternative methods in predicting the distribution of the remaining species. Furthermore, we show that combining observations from multiple datasets can improve performance. CISO is a promising ecological tool, capable of incorporating incomplete biotic information and identifying potential interactions between species from disparate taxa.


Causal Mediation Analysis with Multiple Mediators: A Simulation Approach

Zhou, Jesse, Wodtke, Geoffrey T.

arXiv.org Machine Learning

Analyses of causal mediation often involve exposure-induced confounders or, relatedly, multiple mediators. In such applications, researchers aim to estimate a variety of different quantities, including interventional direct and indirect effects, multivariate natural direct and indirect effects, and/or path-specific effects. This study introduces a general approach to estimating all these quantities by simulating potential outcomes from a series of distribution models for each mediator and the outcome. Building on similar methods developed for analyses with only a single mediator (Imai et al. 2010), we first outline how to implement this approach with parametric models. The parametric implementation can accommodate linear and nonlinear relationships, both continuous and discrete mediators, and many different types of outcomes. However, it depends on correct specification of each model used to simulate the potential outcomes. To address the risk of misspecification, we also introduce an alternative implementation using a novel class of nonparametric models, which leverage deep neural networks to approximate the relevant distributions without relying on strict assumptions about functional form. We illustrate both methods by reanalyzing the effects of media framing on attitudes toward immigration (Brader et al. 2008) and the effects of prenatal care on preterm birth (VanderWeele et al. 2014).


Two-parameter superposable S-curves

S, Vijay Prakash

arXiv.org Artificial Intelligence

Straight line equation $y=mx$ with slope $m$, when singularly perturbed as $ay^3+y=mx$ with a positive parameter $a$, results in S-shaped curves or S-curves on a real plane. As $a\rightarrow 0$, we get back $y=mx$ which is a cumulative distribution function of a continuous uniform distribution that describes the occurrence of every event in an interval to be equally probable. As $a\rightarrow\infty$, the derivative of $y$ has finite support only at $y=0$ resembling a degenerate distribution. Based on these arguments, in this work, we propose that these S-curves can represent maximum entropy uniform distribution to a zero entropy single value. We also argue that these S-curves are superposable as they are only parametrically nonlinear but fundamentally linear. So far, the superposed forms have been used to capture the patterns of natural systems such as nonlinear dynamics of biological growth and kinetics of enzyme reactions. Here, we attempt to use the S-curve and its superposed form as statistical models. We fit the models on a classical dataset containing flower measurements of iris plants and analyze their usefulness in pattern recognition. Based on these models, we claim that any non-uniform pattern can be represented as a singular perturbation to uniform distribution. However, our parametric estimation procedure have some limitations such as sensitivity to initial conditions depending on the data at hand.


MaskSDM with Shapley values to improve flexibility, robustness, and explainability in species distribution modeling

Zbinden, Robin, van Tiel, Nina, Sumbul, Gencer, Vanalli, Chiara, Kellenberger, Benjamin, Tuia, Devis

arXiv.org Artificial Intelligence

Species Distribution Models (SDMs) play a vital role in biodiversity research, conservation planning, and ecological niche modeling by predicting species distributions based on environmental conditions. The selection of predictors is crucial, strongly impacting both model accuracy and how well the predictions reflect ecological patterns. To ensure meaningful insights, input variables must be carefully chosen to match the study objectives and the ecological requirements of the target species. However, existing SDMs, including both traditional and deep learning-based approaches, often lack key capabilities for variable selection: (i) flexibility to choose relevant predictors at inference without retraining; (ii) robustness to handle missing predictor values without compromising accuracy; and (iii) explainability to interpret and accurately quantify each predictor's contribution. To overcome these limitations, we introduce MaskSDM, a novel deep learning-based SDM that enables flexible predictor selection by employing a masked training strategy. This approach allows the model to make predictions with arbitrary subsets of input variables while remaining robust to missing data. It also provides a clearer understanding of how adding or removing a given predictor affects model performance and predictions. Additionally, MaskSDM leverages Shapley values for precise predictor contribution assessments, improving upon traditional approximations. We evaluate MaskSDM on the global sPlotOpen dataset, modeling the distributions of 12,738 plant species. Our results show that MaskSDM outperforms imputation-based methods and approximates models trained on specific subsets of variables. These findings underscore MaskSDM's potential to increase the applicability and adoption of SDMs, laying the groundwork for developing foundation models in SDMs that can be readily applied to diverse ecological applications.